feat: add npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operators for qwen3.5/qwen3-next.#1262

Open
fems14 wants to merge 1 commit into jd-opensource:main from fems14:delat_net
Conversation

@fems14 fems14 commented Apr 11, 2026

Add the npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operators for the qwen3.5/qwen3-next models.
This depends on the operator PR being merged first: https://gitcode.com/xLLM-AI/torch_npu_ops/pull/12

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Qwen3GatedDeltaNetBase implementation to utilize new NPU-specific kernels for chunked and recurrent gated-delta attention and adds cumulative sequence length calculation to the attention metadata. However, several critical issues need to be addressed: the initial state fetched from the SSM cache is being incorrectly zeroed out, which breaks statefulness; the recurrent state is transposed before being stored back in the cache, leading to a layout mismatch; and the removal of head repetition logic combined with the use of squeeze(0) on tensors could result in shape and dimension errors.
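For readers unfamiliar with the cumulative sequence lengths mentioned in the review, the following is a minimal sketch of how such offsets are commonly derived from per-sequence token counts. The helper name `build_cu_seqlens` and the use of `std::vector<int32_t>` are assumptions made for illustration; the actual change in attention_metadata_builder.cpp is not shown in this conversation and may work on tensors instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the code from this PR: builds cumulative
// sequence lengths (often called cu_seqlens) from per-sequence token counts.
// The result has seq_lens.size() + 1 entries and starts at 0, so sequence i
// occupies tokens [cu[i], cu[i + 1]) of the packed batch.
std::vector<int32_t> build_cu_seqlens(const std::vector<int32_t>& seq_lens) {
  std::vector<int32_t> cu(seq_lens.size() + 1, 0);
  for (std::size_t i = 0; i < seq_lens.size(); ++i) {
    assert(seq_lens[i] >= 0 && "sequence lengths must be non-negative");
    cu[i + 1] = cu[i] + seq_lens[i];  // running prefix sum
  }
  return cu;
}
```

With lengths {3, 1, 4}, this sketch yields the offsets {0, 3, 4, 8}, which lets a chunked kernel locate each sequence inside a packed token dimension without padding.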

Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
@yingxudeng yingxudeng changed the title qwe3.5/qwen3-next model add npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operater feat: add npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operators for qwen3.5/qwen3-next. Apr 11, 2026
@yingxudeng yingxudeng marked this pull request as draft April 11, 2026 10:35
@Vectorwh Vectorwh force-pushed the delat_net branch 2 times, most recently from 54fd624 to 44ca51d Compare April 14, 2026 12:26
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
Comment thread xllm/core/layers/common/attention_metadata_builder.cpp
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
@yingxudeng yingxudeng marked this pull request as ready for review April 20, 2026 12:05
zhang-minchao previously approved these changes Apr 20, 2026
yingxudeng previously approved these changes Apr 20, 2026

#include <tuple>

#include "torch_npu/csrc/aten/CustomFunctions.h"
Collaborator

Is this change necessary? If CI/CD passes, leave it as is.
If CI/CD fails and further changes are needed anyway, could this be removed?

Contributor

Removed.

@maojunx99 maojunx99 dismissed stale reviews from zhang-minchao and yingxudeng via 3616aaa April 21, 2026 08:26
@maojunx99 maojunx99 force-pushed the delat_net branch 2 times, most recently from 3616aaa to a21c06b Compare April 21, 2026 08:30
JimHsiung previously approved these changes Apr 21, 2026
…le operations on NPU

The main changes include:
1. Add the implementation file npu_recurrent_gated_delta_rule.cpp
2. Add function declarations in npu_ops_api.h
3. Add a generic interface in ops_api.h and ops_api.cpp
4. Update CMakeLists.txt to include the new source files
5. Integrate new operations in qwen3_gated_delta_net_base.cpp
6. Update submodule version
@yingxudeng
Collaborator

image
